An improved method for density-based clustering

Authors

  • Hong Jin
  • Shuliang Wang
  • Qian Zhou
  • Ying Li
Abstract

Knowledge discovery in large multimedia databases, which usually contain large amounts of noise and high-dimensional feature vectors, is an increasingly important research issue. Density-based clustering has proved to be much more efficient when dealing with such databases. However, its clustering quality depends mainly on the parameter settings, and choosing adequate values for these parameters is difficult without sufficient domain knowledge. To solve this problem, this paper proposes a new approach that directly infers an appropriate value for one of the parameters, the bandwidth. Based on Bayes' theorem, a parameter estimation model is constructed and used to infer a suitable value. The user then only has to preset the other parameter, the noise threshold. The clusters are then identified using the determined parameter values. The experimental results show that the proposed method offers complementary advantages for density-based clustering algorithms.
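
To make the roles of the two parameters concrete, the following is a minimal sketch, not the authors' method. It assumes one-dimensional data and a Gaussian kernel, and it substitutes Silverman's rule of thumb for the paper's Bayesian bandwidth estimator, whose details the abstract does not give. The function names (`silverman_bandwidth`, `gaussian_density`, `split_noise`) are hypothetical and chosen only for illustration.

```python
import math
import statistics


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth for 1-D data.

    Assumption: a stand-in for the paper's Bayesian estimator,
    which is not described in the abstract.
    """
    n = len(values)
    sigma = statistics.stdev(values)
    return 1.06 * sigma * n ** (-1 / 5)


def gaussian_density(x, values, h):
    """Kernel density estimate at x with a Gaussian kernel of bandwidth h."""
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)


def split_noise(values, noise_threshold, h=None):
    """Separate points in dense regions from points treated as noise.

    A point is kept as "dense" when its estimated density reaches
    noise_threshold; otherwise it is labelled noise.
    """
    if h is None:
        h = silverman_bandwidth(values)
    dense, noise = [], []
    for v in values:
        if gaussian_density(v, values, h) >= noise_threshold:
            dense.append(v)
        else:
            noise.append(v)
    return dense, noise


# Two tight groups and one isolated point (illustrative data only).
data = [1.0, 1.1, 0.9, 1.05, 5.0, 5.1, 4.9, 5.05, 20.0]
dense, noise = split_noise(data, noise_threshold=0.1, h=0.5)
print("dense:", dense)
print("noise:", noise)
```

In this sketch the bandwidth controls how far each point's influence spreads, while the noise threshold decides which points are dense enough to belong to a cluster. Passing `h=None` falls back to the rule-of-thumb value, which mirrors the idea of inferring the bandwidth so that only the noise threshold is left to the user.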

Similar resources

An improved opposition-based Crow Search Algorithm for Data Clustering

Data clustering is an ideal way of working with a huge amount of data and looking for structure in the dataset. In other words, clustering groups similar data so that the similarity among the data within a cluster is maximal and the similarity among the data in different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

Improvement of density-based clustering algorithm using modifying the density definitions and input parameter

Clustering, which means grouping similar samples, is one of the main tasks in data mining. There is a wide variety of clustering algorithms, and density-based clustering is one category among them. Various density-based algorithms have been proposed; one of the most widely used is DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to one another in some sense. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...

An Improved Pixon-Based Approach for Image Segmentation

An improved pixon-based method is proposed in this paper for image segmentation. In this approach, a wavelet thresholding technique is initially applied on the image to reduce noise and to slightly smooth the image. This technique causes an image not to be oversegmented when the pixon-based method is used. Indeed, the wavelet thresholding, as a pre-processing step, eliminates the unnecessary details...

A density based clustering approach to distinguish between web robot and human requests to a web server

Today, the world's dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Regardless of the advantages of robots, they may occupy bandwidth and reduce the performance of web servers. Despite a variety of research, there is no accurate method for classifying huge data ...

Journal title:
  • IJDMMM

Volume 6

Publication date: 2014